Diachronic semantic cohesion for topic segmentation of TV broadcast news
Abstract
This paper proposes a new way to integrate semantic relations into a topic segmentation process by defining the notion of semantic cohesion. Within a sliding-window automatic topic segmentation algorithm, semantic relations are incorporated into the similarity measure between adjacent blocks. Additionally, for topic segmentation of TV broadcast news, we propose a new protocol for gathering relevant data to compute semantic relations, and we show that a small set of diachronic data can be more relevant for the task than a large amount of general or asynchronous data. Experiments on a corpus of 86 diverse French TV broadcast news shows recorded during one week, with semantic relations estimated from text articles collected through the Google News homepage over the same period, show a significant improvement in topic segmentation performance.
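The abstract does not give the exact similarity measure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that semantic relations are estimated as normalized PMI (NPMI) of word co-occurrence in a small set of same-period articles (standing in for the diachronic Google News data), and that cohesion between adjacent blocks is a soft cosine, which credits related words as well as identical ones. The function names, the window size and the local-minimum boundary rule are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def estimate_relations(docs):
    """Score word pairs by normalized PMI of document-level co-occurrence.

    docs: list of token lists, e.g. articles collected in the same week as
    the broadcasts (assumption). Returns {frozenset({w1, w2}): strength},
    keeping only positively associated pairs, so strengths lie in (0, 1].
    """
    n_docs = len(docs)
    df = Counter()  # number of documents containing each word
    co = Counter()  # number of documents containing each word pair
    for doc in docs:
        vocab = sorted(set(doc))
        df.update(vocab)
        co.update(frozenset(pair) for pair in combinations(vocab, 2))
    relations = {}
    for pair, c in co.items():
        w1, w2 = tuple(pair)
        pmi = math.log(c * n_docs / (df[w1] * df[w2]))
        if pmi > 0:
            # NPMI = PMI / -log p(w1, w2); p < 1 here because pmi > 0
            relations[pair] = pmi / -math.log(c / n_docs)
    return relations


def relation(w1, w2, relations):
    """Identical words are fully related; unseen pairs are unrelated."""
    if w1 == w2:
        return 1.0
    return relations.get(frozenset((w1, w2)), 0.0)


def soft_dot(a, b, relations):
    """Bag-of-words inner product that also credits related word pairs."""
    return sum(ca * cb * relation(wa, wb, relations)
               for wa, ca in a.items() for wb, cb in b.items())


def semantic_cohesion(left, right, relations):
    """Soft cosine between two blocks of tokens (0.0 if a block is empty)."""
    a, b = Counter(left), Counter(right)
    denom = math.sqrt(soft_dot(a, a, relations) * soft_dot(b, b, relations))
    return soft_dot(a, b, relations) / denom if denom else 0.0


def gap_scores(sentences, relations, window=2):
    """Cohesion at each gap, comparing `window` sentences on either side."""
    scores = []
    for gap in range(1, len(sentences)):
        left = [w for s in sentences[max(0, gap - window):gap] for w in s]
        right = [w for s in sentences[gap:gap + window] for w in s]
        scores.append(semantic_cohesion(left, right, relations))
    return scores


def boundaries(scores, threshold):
    """Hypothetical rule: boundary at local cohesion minima below threshold.

    Returns indices i meaning "a new topic starts at sentence i".
    """
    found = []
    for i, s in enumerate(scores):
        prev = scores[i - 1] if i > 0 else math.inf
        nxt = scores[i + 1] if i + 1 < len(scores) else math.inf
        if s < threshold and s <= prev and s <= nxt:
            found.append(i + 1)
    return found


if __name__ == "__main__":
    # Toy same-period "articles" and a toy transcript with no repeated words.
    articles = [["election", "vote", "candidate"], ["election", "vote", "poll"],
                ["storm", "flood", "rain"], ["storm", "rain", "wind"]]
    transcript = [["election", "candidate"], ["vote", "poll"],
                  ["storm", "wind"], ["flood", "rain"]]
    rels = estimate_relations(articles)
    print(boundaries(gap_scores(transcript, rels, window=1), threshold=0.3))  # prints [2]
```

In this toy transcript no word is repeated across blocks, so with an empty relation dictionary every gap scores 0.0 and the gaps cannot be told apart. With the estimated relations, the two within-topic gaps score about 0.67 and the topic change scores 0.0, so a single boundary is placed before sentence 2.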
Similar resources
Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation
Transcript-based topic segmentation of TV programs faces several difficulties arising from transcription errors, from the presence of potentially short segments and from the limited number of word repetitions to enforce lexical cohesion, i.e., lexical relations that exist within a text to provide a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion ...
Speech cohesion for topic segmentation of spoken contents
In this paper, we introduce the notion of speech cohesion for topic segmentation of a spoken content. The aim is to integrate speaker information and lexical information within a single cohesion value. Based on a lexical cohesion system, we propose an approach that directly integrates the speaker distribution when processing the cohesion. A potential boundary is effective if the joint distribut...
The need to create a media block for the convergence of overseas news networks
As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
Out-Of-Vocabulary (OOV) words missed by Large Vocabulary Continuous Speech Recognition (LVCSR) systems can be recovered with the help of topic and semantic context of the OOV words captured from a diachronic text corpus. In this paper we investigate how the choice of documents for the diachronic text corpora affects the retrieval of OOV Proper Names (PNs) relevant to an audio document. We first...
Segmentation and Indexation of Broadcast News
This paper describes a topic segmentation and indexation system for broadcast news that is integrated in an alert system for selective dissemination of multimedia information. The goal of this work is to enhance the retrieval and navigation through specific spoken audio segments that have been automatically transcribed, using speech recognition. Our segmentation algorithm is based on simple heu...
Publication date: 2015